In this notebook I focus on analyzing text data, specifically the "Customer Support on Twitter" dataset: a collection of over 3 million tweets and replies involving some of the most prominent brands on the platform.
import re
import time
import string
import ast
import random
from tqdm.notebook import tqdm
from IPython.display import Markdown
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from functools import partial
from umap import UMAP
from hdbscan import HDBSCAN
from bertopic import BERTopic
from nltk.corpus import stopwords
import nltk
import torch
from datasets import Dataset
import datasets
import evaluate
from sklearn.feature_extraction.text import CountVectorizer
from sentence_transformers import SentenceTransformer
from peft import LoraConfig, get_peft_model, TaskType, PeftModel, PeftConfig
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, TrainingArguments, Trainer, GenerationConfig
tqdm.pandas()
nltk.download('stopwords')
[nltk_data] Downloading package stopwords to /root/nltk_data... [nltk_data] Package stopwords is already up-to-date!
True
df = pd.read_csv("data/raw_data/twcs.csv")
df.head()
| | tweet_id | author_id | inbound | created_at | text | response_tweet_id | in_response_to_tweet_id |
|---|---|---|---|---|---|---|---|
| 0 | 1 | sprintcare | False | Tue Oct 31 22:10:47 +0000 2017 | @115712 I understand. I would like to assist you. We would need to get you into a private secured link to further assist. | 2 | 3.0 |
| 1 | 2 | 115712 | True | Tue Oct 31 22:11:45 +0000 2017 | @sprintcare and how do you propose we do that | NaN | 1.0 |
| 2 | 3 | 115712 | True | Tue Oct 31 22:08:27 +0000 2017 | @sprintcare I have sent several private messages and no one is responding as usual | 1 | 4.0 |
| 3 | 4 | sprintcare | False | Tue Oct 31 21:54:49 +0000 2017 | @115712 Please send us a Private Message so that we can further assist you. Just click ‘Message’ at the top of your profile. | 3 | 5.0 |
| 4 | 5 | 115712 | True | Tue Oct 31 21:49:35 +0000 2017 | @sprintcare I did. | 4 | 6.0 |
df.describe()
| | tweet_id | in_response_to_tweet_id |
|---|---|---|
| count | 2.811774e+06 | 2.017439e+06 |
| mean | 1.504565e+06 | 1.463141e+06 |
| std | 8.616450e+05 | 8.665730e+05 |
| min | 1.000000e+00 | 1.000000e+00 |
| 25% | 7.601652e+05 | 7.155105e+05 |
| 50% | 1.507772e+06 | 1.439805e+06 |
| 75% | 2.253296e+06 | 2.220646e+06 |
| max | 2.987950e+06 | 2.987950e+06 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2811774 entries, 0 to 2811773
Data columns (total 7 columns):
 #   Column                   Dtype
---  ------                   -----
 0   tweet_id                 int64
 1   author_id                object
 2   inbound                  bool
 3   created_at               object
 4   text                     object
 5   response_tweet_id        object
 6   in_response_to_tweet_id  float64
dtypes: bool(1), float64(1), int64(1), object(4)
memory usage: 131.4+ MB
tweet_id: The unique ID for this tweet
author_id: The unique ID for this tweet author (anonymized for non-company users)
inbound: Whether or not the tweet was sent (inbound) to a company
created_at: When the tweet was created
text: The text content of the tweet
response_tweet_id: The tweet that responded to this one, if any
in_response_to_tweet_id: The tweet this tweet was in response to, if any
As a company experiences significant growth and an increase in the number of customers, the customer support team faces a challenge in handling customer issues promptly and efficiently. Traditionally, the support team has relied on chat interfaces or social media platforms to address customer inquiries. However, as the demand for support surges, merely adding more personnel might not be enough to maintain a quick response time.
To address this issue, some companies have explored the implementation of conversational AI to handle customer interactions. However, developing an effective conversational agent presents its own set of challenges. Such AI systems require an extensive amount of data to respond accurately, particularly in urgent situations where customers are desperately seeking help.
An alternative approach to expedite responses combines the expertise of human agents with AI assistance. In a hybrid model, where human agents are aided by AI-generated response suggestions, the support team can engage with customers more swiftly. This collaboration streamlines the response process, allowing the team to address customer concerns in a timely manner and meet the growing demands of an expanding customer base. As a result, the company can deliver a more efficient and satisfactory customer support experience.
In this notebook, we focus on customer service for order and delivery issues and aim to build an AI assistant that helps the support team respond more efficiently.
author_count = df.groupby('author_id').size().reset_index(name='count').sort_values(['count'], ascending=False)[:20]
fig = px.pie(author_count, names=author_count.author_id, values='count', title='Top 20 Company Names')
fig.show()
df['date'] = pd.to_datetime(df['created_at'], format='%a %b %d %H:%M:%S +0000 %Y')
date_counts = df['date'].dt.date.value_counts().reset_index()
date_counts.columns = ['date', 'count']
date_counts = date_counts.sort_values(by='date')
date_counts.quantile(0.90)
count 84.4 Name: 0.9, dtype: float64
fig = px.box(date_counts, y='count')
fig.update_layout(title='Date Distribution', yaxis_title='Count')
fig.show()
fig = px.line(date_counts, x='date', y='count', title='Number of Tweets by Date')
fig.show()
df = df[df['date'] > pd.to_datetime('2017-09-01')]
def clean_text(text):
    # Replace URLs with "the url"
    text = re.sub(r'http\S+|www\S+|https\S+', 'the url', text)
    # Remove words starting with "@" (user mentions)
    text = re.sub(r'\B@\w+', '', text)
    # Remove symbols and emojis
    text = re.sub(r'[^\w\s,.:\'!?-]', '', text)
    # Remove all punctuation except ",", ".", ":", "'", "?", "!", "-"
    punctuations_to_keep = ',.:\'!?-'
    text = ''.join([char if char in string.ascii_letters + string.digits + punctuations_to_keep else ' ' for char in text])
    # Remove non-ASCII characters
    text = ''.join([char for char in text if ord(char) < 128])
    # Remove two-letter agent signatures following "-" (e.g. "-km")
    text = re.sub(r'\s*-\w{2}\b', '', text)
    # Collapse repeated whitespace
    text = re.sub(r'\s+', ' ', text)
    # Lowercase everything
    text = text.lower()
    return text.strip()
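Two of the substitutions above are easy to misread, so here is a quick standalone check of what they actually match (restating only those two patterns):

```python
import re

# \B@\w+ removes user mentions: \B requires a non-word character (or start of
# string) right before the "@", so "@sprintcare" matches but "a@b" does not.
print(re.sub(r'\B@\w+', '', '@sprintcare I did.'))       # " I did."
# \s*-\w{2}\b removes the two-letter agent signatures seen in replies, e.g. "-km".
print(re.sub(r'\s*-\w{2}\b', '', 'happy to help! -km'))  # "happy to help!"
```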
# Apply cleaning
df['text'] = df['text'].progress_apply(clean_text)
df.head()
0%| | 0/2804129 [00:00<?, ?it/s]
| | tweet_id | author_id | inbound | created_at | text | response_tweet_id | in_response_to_tweet_id | date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | sprintcare | False | Tue Oct 31 22:10:47 +0000 2017 | i understand. i would like to assist you. we would need to get you into a private secured link to further assist. | 2 | 3.0 | 2017-10-31 22:10:47 |
| 1 | 2 | 115712 | True | Tue Oct 31 22:11:45 +0000 2017 | and how do you propose we do that | NaN | 1.0 | 2017-10-31 22:11:45 |
| 2 | 3 | 115712 | True | Tue Oct 31 22:08:27 +0000 2017 | i have sent several private messages and no one is responding as usual | 1 | 4.0 | 2017-10-31 22:08:27 |
| 3 | 4 | sprintcare | False | Tue Oct 31 21:54:49 +0000 2017 | please send us a private message so that we can further assist you. just click message at the top of your profile. | 3 | 5.0 | 2017-10-31 21:54:49 |
| 4 | 5 | 115712 | True | Tue Oct 31 21:49:35 +0000 2017 | i did. | 4 | 6.0 | 2017-10-31 21:49:35 |
Conversations are linked through the response_tweet_id and in_response_to_tweet_id columns. To extract conversations between companies and customers on specific topics, I wrote a recursive function, demonstrated in the section below. Rows with a NaN value in the in_response_to_tweet_id column represent instances where customers initiated a conversation.
df_starting = df[df['in_response_to_tweet_id'].isna()]
df_starting.head()
| | tweet_id | author_id | inbound | created_at | text | response_tweet_id | in_response_to_tweet_id | date |
|---|---|---|---|---|---|---|---|---|
| 6 | 8 | 115712 | True | Tue Oct 31 21:45:10 +0000 2017 | is the worst customer service | 9,6,10 | NaN | 2017-10-31 21:45:10 |
| 12 | 18 | 115713 | True | Tue Oct 31 19:56:01 +0000 2017 | yall lie about your great connection. 5 bars l... | 17 | NaN | 2017-10-31 19:56:01 |
| 14 | 20 | 115715 | True | Tue Oct 31 22:03:34 +0000 2017 | whenever i contact customer support, they tell... | 19 | NaN | 2017-10-31 22:03:34 |
| 23 | 29 | 115716 | True | Tue Oct 31 22:01:35 +0000 2017 | actually that's a broken link you sent me and ... | 28 | NaN | 2017-10-31 22:01:35 |
| 25 | 31 | 115717 | True | Tue Oct 31 22:06:54 +0000 2017 | yo , your customer service reps are super nice... | 30 | NaN | 2017-10-31 22:06:54 |
Now, by iterating over the df_starting dataset and recursively following "response_tweet_id" through df, we can retrieve the entire conversations between customers and the support teams. An example is provided below.
df = df.set_index('tweet_id')
def display_markdown(input_list):
    markdown_str = "\n\n".join([">{}".format(item) for item in input_list])
    display(Markdown(markdown_str))
def isNaN(num):
    return num != num
def create_conversations(df, conversation_id, index):
    # Follow response_tweet_id links recursively and collect one conversation.
    if conversation_id is None:
        conversation_id = []
    try:
        row = df.loc[index]
        conversation_id.append(index)
    except KeyError:
        # The referenced tweet was filtered out of df; return what we have.
        return df.loc[conversation_id].text.values.tolist()
    response = row.response_tweet_id
    if not pd.isna(response):
        # response_tweet_id is a string such as "9,6,10"; ast.literal_eval
        # turns it into an int or a tuple of ints.
        next_index = ast.literal_eval(response)
        if isinstance(next_index, tuple):
            # When a tweet has several replies, follow only the first branch.
            create_conversations(df, conversation_id, next_index[0])
        else:
            create_conversations(df, conversation_id, next_index)
    return df.loc[conversation_id].text.values.tolist()
Example: retrieve the whole conversation that was initiated by the customer at index 18.
conv_list = create_conversations(df=df, conversation_id=None, index=18)
display_markdown(conv_list)
yall lie about your great connection. 5 bars lte, still wont load something. smh.
h there! we'd definitely like to work with you on this, how long have you been experiencing this issue?
since i signed up with you....since day 1
we understand your concerns and we'd like for you to please send us a direct message, so that we can further assist you.
you gonna magically change your connectivity for me and my whole family ?
this is saddening to hear. please shoot us a dm, so that we can look into this for you.
Next, I apply the recursive function to the initiating tweets to obtain the complete conversations. It takes some time 🕛 so go and grab a coffee ☕
create_conversations_partial = partial(create_conversations, df, None)
all_conversation = df_starting['tweet_id'].progress_map(create_conversations_partial)
0%| | 0/652770 [00:00<?, ?it/s]
# Saving and loading
# np.save("data/conversations/all_conversation.npy", all_conversation)
all_conversation = np.load("data/conversations/all_conversation.npy", allow_pickle=True)
# Example
display_markdown(all_conversation[0])
is the worst customer service
i would love the chance to review the account and provide assistance.
# Sampling 20% of all conversations
#all_conversation_sampled = np.random.choice(all_conversation, size=int(len(all_conversation) * 0.2), replace=False)
# Saving and Loading
#np.save("data/conversations/all_conversation_sampled.npy", pd.Series(all_conversation_sampled))
all_conversation_sampled = np.load("data/conversations/all_conversation_sampled.npy", allow_pickle=True)
# Example
display_markdown(all_conversation_sampled[10])
kindly assist reverse a transaction
please provide an m-pesa sms we check. in
all_conversation_sampled_flatten = [item for sublist in all_conversation_sampled for item in sublist]
len(all_conversation_sampled_flatten)
377393
# Select Embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
# Configure the dimension reduction algorithm (I used UMAP)
umap_model = UMAP(n_neighbors=5, n_components=5)
# Configure the clustering algorithm (I used HDBSCAN)
hdbscan_model = HDBSCAN(min_cluster_size=256, min_samples=64,
                        gen_min_span_tree=True,
                        prediction_data=True)
It takes some time 🕛
stop_words = list(stopwords.words('english'))
# We add this to remove stopwords that can pollute topics
vectorizer_model = CountVectorizer(ngram_range=(1, 2), stop_words=stop_words)
model_bert_topic = BERTopic(
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
    embedding_model=embedding_model,
    vectorizer_model=vectorizer_model,
    top_n_words=5,
    language='english',
    calculate_probabilities=True,
    verbose=True
)
topics, probs = model_bert_topic.fit_transform(all_conversation_sampled_flatten)
Batches: 0%| | 0/11794 [00:00<?, ?it/s]
2023-08-05 19:14:02,843 - BERTopic - Transformed documents to Embeddings 2023-08-05 19:22:54,702 - BERTopic - Reduced dimensionality 2023-08-05 19:36:16,535 - BERTopic - Clustered reduced embeddings
# Saving and Loading
# model_bert_topic.save("data/models/bert_topic/model", serialization="safetensors", save_ctfidf=True, save_embedding_model=embedding_model)
model_bert_topic = BERTopic.load("data/models/bert_topic/model")
model_bert_topic.get_topic_info()
| | Topic | Count | Name | Representation | Representative_Docs |
|---|---|---|---|---|---|
| 0 | -1 | 181183 | -1_url_please_us_help | [url, please, us, help, thanks] | NaN |
| 1 | 0 | 11890 | 0_dm_us dm_us_send us | [dm, us dm, us, send us, send] | NaN |
| 2 | 1 | 8891 | 1_driver_uber_ride_drivers | [driver, uber, ride, drivers, lyft] | NaN |
| 3 | 2 | 8817 | 2_de_que_en_la | [de, que, en, la, el] | NaN |
| 4 | 3 | 6222 | 3_spotify_music_songs_song | [spotify, music, songs, song, playlist] | NaN |
| ... | ... | ... | ... | ... | ... |
| 187 | 186 | 269 | 186_iphone_freezing_update_ios update | [iphone, freezing, update, ios update, ios] | NaN |
| 188 | 187 | 267 | 187_dell_laptop_warranty_inspiron | [dell, laptop, warranty, inspiron, laptops] | NaN |
| 189 | 188 | 266 | 188_aadhar_aadhaar_stream yet_check singles | [aadhar, aadhaar, stream yet, check singles, s... | NaN |
| 190 | 189 | 266 | 189_return_replacement_item_options url | [return, replacement, item, options url, options] | NaN |
| 191 | 190 | 262 | 190_today_pm_tomorrow_8pm | [today, pm, tomorrow, 8pm, day] | NaN |
192 rows × 5 columns
model_bert_topic.visualize_topics()
model_bert_topic.visualize_barchart(top_n_topics=12)
def get_all_docs_per_topic(topics):
    topic_docs = {topic: [] for topic in set(topics)}
    for topic, doc in zip(topics, all_conversation_sampled_flatten):
        topic_docs[topic].append(doc)
    return topic_docs
def get_all_conversation_by_topic(topic, topic_docs, all_conversation):
    conversations = []
    for doc in tqdm(topic_docs[topic]):
        for conv in all_conversation:
            if doc in conv and conv not in conversations:
                conversations.append(conv)
    return conversations
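A side note on get_all_conversation_by_topic: it rescans every sampled conversation for every document, which is quadratic in the data size. A sketch of a faster variant that indexes documents to conversations once (toy data; names like doc_index are mine, not from the notebook):

```python
from collections import defaultdict

# Toy stand-ins for all_conversation_sampled and topic_docs[topic]
conversations = [
    ["my order is late", "sorry, please dm us"],
    ["love the playlist feature", "thanks!"],
]
topic_documents = ["my order is late"]

# Build a doc -> conversation-indices map once, instead of rescanning per document
doc_index = defaultdict(list)
for i, conv in enumerate(conversations):
    for doc in conv:
        doc_index[doc].append(i)

hits = sorted({i for doc in topic_documents for i in doc_index.get(doc, [])})
topic_conversations = [conversations[i] for i in hits]
print(topic_conversations)  # [['my order is late', 'sorry, please dm us']]
```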
topics = model_bert_topic.topics_
topic_docs = get_all_docs_per_topic(topics)
delivery_conv = get_all_conversation_by_topic(
topic=8, topic_docs=topic_docs, all_conversation=all_conversation_sampled)
0%| | 0/4201 [00:00<?, ?it/s]
# The number of conversations discussing the order and delivery service
len(delivery_conv)
3250
# Example
display_markdown(delivery_conv[500])
are you gone mad.i have received no package still.but your sms says it is done.keep any system online to inform regarding it. the url
sorry about the trouble with the delivery. please report this to our support team here : the url as
This section is divided into two parts:
1) Evaluating the performance of a pre-trained Generative Model (specifically Flan-T5) in simulating a customer support team member's responses through in-context learning.
2) Enhancing the model's alignment with the given topic by fine-tuning it.
It is important to consider the following points:
1) Due to limited computing resources, a smaller version of the model with fewer parameters was used.
2) I did not conduct an extensive model comparison. Flan-T5 was chosen as it is known for its effectiveness in question-answering tasks, as recommended in the literature.
3) Fine-tuning the model requires significant memory resources (over 16GB). The fine-tuning process and source code are discussed in this notebook, but the actual execution was carried out on AWS SageMaker to obtain the results. If you have more than 16GB of RAM on your local machine, give it a try 😊
I am using just the first comment and its corresponding response in each conversation.
def split_comment_response(conversations):
    comments = []
    responses = []
    for conv in conversations:
        if len(conv) > 1:
            comments.append(conv[0])
            responses.append(conv[1])
    return comments, responses
comments, responses = split_comment_response(delivery_conv)
assert len(comments) == len(responses)
print(len(comments))
3227
display_markdown([comments[188]])
is their anyway to track the status of a delivery??
display_markdown([responses[188]])
you can track your orders here: the url tn
model_name='google/flan-t5-base'
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Zero-shot prompt
def generate_prompt(comment):
    template = f"""As a member of the customer support team, you must use the following guidelines to respond to customers who have expressed complaints or asked a question about the service they received. Your task is to generate a helpful and empathetic response to address the customer's concerns effectively.
Please ensure that the generated response follows these guidelines:
1. Start the response with a polite and courteous greeting to the customer.
2. If there is a complaint, express empathy and understanding towards the customer's complaint.
3. If there is a complaint, provide a sincere apology for any inconvenience caused by the service issue.
4. If there is a question, provide the required information to answer the question.
5. Clearly address the specific points raised in the customer's complaint.
6. Explain the steps or actions being taken to resolve the issue.
7. Offer any necessary compensation or remedies, if applicable.
8. Encourage the customer to reach out for further assistance if needed.
9. End the response with another polite expression of gratitude and willingness to assist.
Customer's Complaint: {comment}
Your Generated Response:
"""
    return template
# Few-shot prompt
def generate_prompt_few_shot(comment):
    template = f"""As a member of the customer support team, you must use the following guidelines to respond to customers who have expressed complaints or asked a question about the service they received. Your task is to generate a helpful and empathetic response to address the customer's concerns effectively.
Please ensure that the generated response follows these guidelines:
1. Start the response with a polite and courteous greeting to the customer.
2. If there is a complaint, express empathy and understanding towards the customer's complaint.
3. If there is a complaint, provide a sincere apology for any inconvenience caused by the service issue.
4. If there is a question, provide the required information to answer the question.
5. Clearly address the specific points raised in the customer's complaint.
6. Explain the steps or actions being taken to resolve the issue.
7. Offer any necessary compensation or remedies, if applicable.
8. Encourage the customer to reach out for further assistance if needed.
9. End the response with another polite expression of gratitude and willingness to assist.
Example 1:
Customer's Expression: I contacted seller and was rudely hung up on several times scam.
Your Response: I'm sorry for the troubles with your order! have you tried to contact us here?
Example 2:
Customer's Expression: Amazonhelp got the prime trial, but didn't give us discount on one item.
Your Response: Prime offers free shipping on items shipped by amazon. what discount are you referring to?
Example 3:
Customer's Expression: Argoshelpers hi there i pre ordered the xbox one x i'm enquiring if i will be receiving it today?
Your Response: Can i have the order number and your full address please and i will look into it.
Customer's Expression: {comment}
Your Response:
"""
    return template
def split_data(inputs, labels, test_size=25):
    data = list(zip(inputs, labels))
    # Shuffle the data randomly
    random.shuffle(data)
    # Split into test and train sets
    test_data = data[:test_size]
    train_data = data[test_size:]
    test_inputs, test_labels = zip(*test_data)
    train_inputs, train_labels = zip(*train_data)
    return train_inputs, train_labels, test_inputs, test_labels
train_comments, train_responses, test_comments, test_responses = split_data(comments, responses)
# np.save('data/test_train/test_comments.npy', test_comments)
# np.save('data/test_train/train_comments.npy', train_comments)
# np.save('data/test_train/test_responses.npy', test_responses)
# np.save('data/test_train/train_responses.npy', train_responses)
test_comments = np.load('data/test_train/test_comments.npy')
train_comments = np.load('data/test_train/train_comments.npy')
test_responses = np.load('data/test_train/test_responses.npy')
train_responses = np.load('data/test_train/train_responses.npy')
print("number_of_train_data: {}".format(len(train_comments)))
print("number_of_test_data: {}".format(len(test_comments)))
number_of_train_data: 3202 number_of_test_data: 25
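Note that split_data shuffles with the unseeded global random state, so re-running the notebook produces a different test set (which is why the split is saved to .npy files above). A seeded variant (split_data_seeded is my name, not the notebook's) would make the split reproducible:

```python
import random

def split_data_seeded(inputs, labels, test_size=25, seed=42):
    rng = random.Random(seed)  # local RNG: reproducible, leaves global state alone
    data = list(zip(inputs, labels))
    rng.shuffle(data)
    test_data, train_data = data[:test_size], data[test_size:]
    test_inputs, test_labels = zip(*test_data)
    train_inputs, train_labels = zip(*train_data)
    return train_inputs, train_labels, test_inputs, test_labels

# Toy check: same seed gives the same split on two runs
comments = [f"comment {i}" for i in range(100)]
responses = [f"response {i}" for i in range(100)]
a = split_data_seeded(comments, responses, test_size=25)
b = split_data_seeded(comments, responses, test_size=25)
print(a == b)  # True
```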
def generate_response(comment, model, tokenizer):
    prompt = generate_prompt_few_shot(comment)
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids
    output = tokenizer.decode(
        model.generate(
            input_ids=input_ids,
            max_new_tokens=500,
            do_sample=True,
            temperature=0.7)[0],
        skip_special_tokens=True
    )
    return output
results = {}
for comment, response in tqdm(zip(test_comments, test_responses)):
    output = generate_response(comment, original_model, tokenizer)
    results[comment] = {"human": response, "original_model": output}
0it [00:00, ?it/s]
pd.set_option('display.max_colwidth', 500)
original_model_response = pd.DataFrame.from_dict(results).T.reset_index()
original_model_response = original_model_response.rename(columns={"index": "comment"})
original_model_response
| | comment | human | original_model |
|---|---|---|---|
| 0 | order says it was delivered last night at 20.45 however at this time i was on a call with amazon customer service. still no delivery | i'm sorry you haven't received your package yet. when you spoke with us, what options did we provide? ay | I have sent you the order number and your order number, but I wasn't able to find it. I have tried searching and contacting amazon customer service, but the package has not been delivered to my address. |
| 1 | my parcel hasn't arrived and the tracking message is poor 'probably arrived as we expected to be there by now'?? vague? | were there any emails sent to you regarding this order? if yes, did it advise of any delays? ma | What would you like to know about the problem with your order? |
| 2 | still waiting for my copy of the new book despite pre ordering. amazon need to get their shit together | hi nick, were you provided with a delivery date? tp | I'm sorry to hear that. I'm concerned about the products you're requesting. You can expect to receive your order in a few days. |
| 3 | - delivery notices showing what was delivered on my kids kindles is a terrible idea. unfortunately, the team in india cannot help me turn off delivery notifications...on my 4th person and probably further away from resolution | oh my! do you have the amazon app on the device? if so, have you tried turning those notifcations off? let us know, we want to help! mt | I know you're not making good use of that service. How do you feel about the inconvenience? |
| 4 | dear . sick of ordering items to find, after paying, they are coming from china and take weeks!! | so sorry about this. we do aim to deliver by the delivery date given at the time the order was placed. km | I understand, but what I do want to do is to help you resolve your issue. |
| 5 | can you explain why my delivery was thrown over my back gate open!?there was no note left either to say where it wasdelivered. | i'm so sorry it was delivered like this! was any portion of your order damaged or missing? jr | Thank you so much for the response. I hope you're all okay. |
| 6 | please can you look into this order: 206-9303243-6219526 i paid for next day delivery and still not received!! i would like 12 | we do not have access to account information on this platform, claire. what date were you given for the delivery of this order? is this item fulfilled by amazon or a seller?sm | I'm sorry to hear that. |
| 7 | tracking id q54023807103 if it was out for delivery yesterday why do i have to wait until tuesday? | hi, sorry to hear that, what was the estimated delivery date stated on the order confirmation email? jj | Your order was shipped yesterday, did you receive the package? |
| 8 | experience on-time correct order missing item delivered to door said he couldn't come in bc of other deliveries pleasant courier bonus | sorry to hear about your experience, please send us a note here: the url and we'll look into this further. | We were able to correct the issue in the delivery times. |
| 9 | you missed delivering a package to my work address. i dont know how but can you hold my package at the facility so i can pick it up tonight? | please dm us details of your concern or compliment include your tracking and phone number. don't forget to provide your shipping address. lg the url | I can put in the information on your account. |
| 10 | hey my package is literally right next door. less than 5 miles away. why would it take two days to deliver that??? the url | our team would like to research more, please dm the tracking number, address, and telephone number. tv the url | That is a problem. What is the best way to address this issue? |
| 11 | so after a 3rd terrible experience with deliveries i will be taking my custom elsewhere from now on poorservice disappointed | we'd love to help out in any way we can! without sharing accountpersonal details, can you please tell us what happened? wj | I have tried this store twice now, both times I was not satisfied. |
| 12 | pathetic service by , order should have delivered by 28 oct but no body including dnt know whr my order is. superannoyed loss paying extra money to stay at delivery address. order no: 204-4348609-9182765 | i'm sorry for the frustrating experience! did our customer service set any expectations for the parceldelivery? mj | Sorry to hear that. I can help you on your request. |
| 13 | this is the 2nd time my prime membership let me down. placed order on 09112017 and still not delivered! amazonprime the url | oh no! i'm sorry that your packages have been delayed. product availability, severe weather, and carrier capacity can all affect your delivery date. please reach out to us here for further assistance: the url sp | What's up with this? |
| 14 | my package has been in china since the 29th and its suppose to be here in the usa by the 3rd? whats going on??? | please dm us details of your concern or compliment include your tracking and phone number. we're here to help. lg the url | I will try to pick up your package from china by the 3rd. |
| 15 | package said it will be delivered by end of day it's already 930pm pacific time still no package what's going on? | i understand how upsetting this must be. we are working to process and deliver packages as best we can. i do apologize for any inconvenience this delay has caused. ea the url | Wait for your package to arrive and you'll be able to get it. |
| 16 | has anyone noticed 'guaranteed delivery' dates have been slipping by a lot lately? ominous holiday troubles ahead? | i'm sorry to hear this! have you reported these late orders to our customer support team? if so, what information was provided? ez | Oh no, not a lot! I've asked for a new item and it says 'guaranteed delivery'. Could you please tell me what are the reasons? |
| 17 | dear , you should highly reconsider sending my prime orders through . nothing is ever delivered in two days... sincerely, where the fuck is my package | thanks for reaching out to us. what was the delivery date we gave you for this order?sm | That's okay, thank you for choosing Amazon. |
| 18 | i've been waiting in all day for a delivery, i've just checked on it and it says an attempt was made..uhh no there wasn't | oh no! let's try reaching out to the courier to get some additional information: the url fd | Thank you. I wish I could help you. |
| 19 | hi there... it's embarrassing that i filed a return 17th due to defective product.. however i am still waiting... | we're sorry for the delay, yogesh. was a pick up scheduled for your product? if yes, when was it due? hk | You're welcome. Have a nice day! |
| 20 | amazon emailed me that my laptop charger delivery date has been delayed. i ordered it sept 26th with two day shipping. | i'm sorry for the delay! which carrier is assigned to your order? does the tracking on their website give any clarity? mv | Thank you. I hope you are fine. |
| 21 | why pay for prime and not get your package in 2 days?? come on.. | two-day shipping refers to transit time, in business days, once shipped. have we missed your confirmed delivery date? am | Sorry for the inconvenience. |
| 22 | hi there, can someone else collect my dpd parcel in your winterstoke road store if they take my id and reference number? | hi meg, that shouldn't be a problem, but to double check, i'd recommend giving the store a call directly on 0117 966 3064. danny | Thank you for your help. |
| 23 | how do i get a refund on shipping? ordered overnight shipping and it took an extra few days! thanks for your help! | we're here to help, sorry for the wait! have we missed our expected delivery date provided here: the url ? please, let us know! je | Yes, I am trying to get refunds on your order. Can you email me your refund and receipt? |
| 24 | had e-mail from amazonshipping attempted to deliver and failed. sitting in my office next to the front door! no delivery attempt! | i'm sorry to hear about the problem with this delivery! for further assistance, please use this: the url th | Thanks! |
model_name='google/flan-t5-base'
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
conversation_dataset = {}
conversation_dataset['comments'] = train_comments
conversation_dataset['responses'] = train_responses
dataset = Dataset.from_dict(conversation_dataset)
def tokenize_function(example):
    prompt = [generate_prompt_few_shot(comment) for comment in example['comments']]
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example['labels'] = tokenizer(example['responses'], padding="max_length", truncation=True, return_tensors="pt").input_ids
    return example
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['comments', 'responses',])
Map: 0%| | 0/3202 [00:00<?, ? examples/s]
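One caveat with the tokenization above: the labels are padded to max_length with the tokenizer's pad token, and those pad positions also contribute to the loss. A common refinement (not applied in this run) is to replace pad token ids in the labels with -100, which the Hugging Face seq2seq loss ignores. A minimal sketch, assuming pad_token_id=0 as in the T5 tokenizer:

```python
# Replace pad token ids in label sequences with -100 so cross-entropy skips them.
# pad_token_id=0 matches the T5 tokenizer's pad token.
def mask_label_padding(label_ids, pad_token_id=0):
    return [[tok if tok != pad_token_id else -100 for tok in seq] for seq in label_ids]

print(mask_label_padding([[5, 6, 1, 0, 0]]))  # [[5, 6, 1, -100, -100]]
```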
# Saving and Loading
#tokenized_datasets.save_to_disk("data/fine_tunning_data/tokenized_datasets")
tokenized_datasets = datasets.load_from_disk("data/fine_tunning_data/tokenized_datasets")
Saving the dataset (0/1 shards): 0%| | 0/3202 [00:00<?, ? examples/s]
tokenized_datasets
Dataset({
features: ['input_ids', 'labels'],
num_rows: 3202
})
lora_config = LoraConfig(
    r=256,  # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM  # FLAN-T5
)
peft_model = get_peft_model(original_model, lora_config)
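Under the hood, each targeted projection `W` gets a low-rank update: the adapted forward pass computes `W x + (alpha / r) * B @ A @ x`, where only `A` and `B` are trained and `B` is zero-initialized, so training starts exactly from the pretrained behavior. A toy numpy sketch of this mechanism (dimensions shrunk for readability; not PEFT's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 8, 2, 32               # toy dims; the notebook uses r=256, lora_alpha=32

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # LoRA down-projection (trained)
B = np.zeros((d, r))                 # LoRA up-projection, zero-initialized (trained)

x = rng.normal(size=d)
y = W @ x + (alpha / r) * (B @ (A @ x))  # adapted forward pass

# with B initialized to zero, the adapter contributes nothing at step 0
assert np.allclose(y, W @ x)
```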
def print_number_of_trainable_model_parameters(model):
trainable_model_params = 0
all_model_params = 0
for _, param in model.named_parameters():
all_model_params += param.numel()
if param.requires_grad:
trainable_model_params += param.numel()
return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"
print(print_number_of_trainable_model_parameters(peft_model))
trainable model parameters: 28311552
all model parameters: 275889408
percentage of trainable model parameters: 10.26%
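The trainable-parameter count can be sanity-checked by hand. Assuming the standard flan-t5-base configuration (d_model = 768, 12 encoder layers with self-attention, 12 decoder layers with self- and cross-attention), targeting `"q"` and `"v"` touches 72 projection matrices, and each rank-256 adapter adds `r * (in_features + out_features)` parameters:

```python
d_model, r = 768, 256
encoder_blocks = 12 * 1  # 12 encoder layers, self-attention only
decoder_blocks = 12 * 2  # 12 decoder layers, self- plus cross-attention
num_matrices = (encoder_blocks + decoder_blocks) * 2  # q and v per block
lora_params = num_matrices * r * (d_model + d_model)
print(lora_params)  # 28311552, matching the trainer's report above
```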
output_dir = f'data/models/fine_tunned_flan_t5/comments-response-training-{str(int(time.time()))}'
peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
    num_train_epochs=10,
    logging_steps=1,
    max_steps=10 # Overrides num_train_epochs: training stops after 10 steps.
)
peft_trainer = Trainer(
model=peft_model,
args=peft_training_args,
train_dataset=tokenized_datasets,
)
peft_trainer.train()
| Step | Training Loss |
|---|---|
| 1 | 47.500000 |
| 2 | 46.500000 |
| 3 | 43.750000 |
| 4 | 39.250000 |
| 5 | 36.750000 |
| 6 | 34.750000 |
| 7 | 33.500000 |
| 8 | 32.250000 |
| 9 | 31.500000 |
| 10 | 30.875000 |
TrainOutput(global_step=10, training_loss=37.6625, metrics={'train_runtime': 673.558, 'train_samples_per_second': 0.119, 'train_steps_per_second': 0.015, 'total_flos': 58259511705600.0, 'train_loss': 37.6625, 'epoch': 0.02})
peft_model_path="data/models/fine_tunned_flan_t5/peft-comment-response-checkpoint-local"
#peft_trainer.model.save_pretrained(peft_model_path)
#tokenizer.save_pretrained(peft_model_path)
('data/models/fine_tunned_flan_t5/peft-comment-response-checkpoint-local/tokenizer_config.json',
'data/models/fine_tunned_flan_t5/peft-comment-response-checkpoint-local/special_tokens_map.json',
'data/models/fine_tunned_flan_t5/peft-comment-response-checkpoint-local/spiece.model',
'data/models/fine_tunned_flan_t5/peft-comment-response-checkpoint-local/added_tokens.json',
'data/models/fine_tunned_flan_t5/peft-comment-response-checkpoint-local/tokenizer.json')
peft_model_base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
peft_model = PeftModel.from_pretrained(peft_model_base,
'data/models/fine_tunned_flan_t5/peft-comment-response-checkpoint-local/',
torch_dtype=torch.bfloat16,
is_trainable=False)
print(print_number_of_trainable_model_parameters(peft_model))
trainable model parameters: 0
all model parameters: 261733632
percentage of trainable model parameters: 0.00%
def generate_respond(comment, model, tokenizer):
    prompt = generate_prompt_few_shot(comment)
    input_ids = tokenizer(prompt, return_tensors='pt').input_ids
    output = tokenizer.decode(
        model.generate(
            input_ids=input_ids,
            generation_config=GenerationConfig(max_new_tokens=500,
                                               do_sample=True,
                                               temperature=0.7),
        )[0],
        skip_special_tokens=True
    )
    return output
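Because `do_sample=True` is set, `temperature=0.7` rescales the logits before sampling; values below 1 sharpen the distribution toward the most likely tokens, values above 1 flatten it. A small pure-Python illustration of the effect (toy logits, not from the model):

```python
import math

def softmax_with_temperature(logits, temperature):
    # divide logits by the temperature, then apply a numerically stable softmax
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 0.7)  # the notebook's setting
flat = softmax_with_temperature(logits, 1.5)
# lower temperature concentrates more probability on the top token
assert sharp[0] > flat[0]
```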
peft_results = {}
for comment, response in tqdm(zip(test_comments, test_responses), total=len(test_comments)):
output = generate_respond(comment, peft_model, tokenizer)
peft_results[comment] = {"human": response, "peft_model": output}
pd.set_option('display.max_colwidth', 500)
peft_model_response = pd.DataFrame.from_dict(peft_results).T.reset_index()
peft_model_response = peft_model_response.rename(columns={"index":"comment"})
all_models_response = pd.merge(original_model_reponse, peft_model_response, on=['comment', 'human'])
all_models_response
| comment | human | original_model | peft_model | |
|---|---|---|---|---|
| 0 | order says it was delivered last night at 20.45 however at this time i was on a call with amazon customer service. still no delivery | i'm sorry you haven't received your package yet. when you spoke with us, what options did we provide? ay | I have sent you the order number and your order number, but I wasn't able to find it. I have tried searching and contacting amazon customer service, but the package has not been delivered to my address. | You are welcome to email us at support@amazon.com. If you have any questions or concerns, please feel free to contact us. |
| 1 | my parcel hasn't arrived and the tracking message is poor 'probably arrived as we expected to be there by now'?? vague? | were there any emails sent to you regarding this order? if yes, did it advise of any delays? ma | What would you like to know about the problem with your order? | I'm sorry to hear that. I'm also going to try to get your order back from the seller so that we can make sure that the package is safe. |
| 2 | still waiting for my copy of the new book despite pre ordering. amazon need to get their shit together | hi nick, were you provided with a delivery date? tp | I'm sorry to hear that. I'm concerned about the products you're requesting. You can expect to receive your order in a few days. | Thank you so much for the help! I am really sorry I still cannot get my book. I need to find a publisher! |
| 3 | - delivery notices showing what was delivered on my kids kindles is a terrible idea. unfortunately, the team in india cannot help me turn off delivery notifications...on my 4th person and probably further away from resolution | oh my! do you have the amazon app on the device? if so, have you tried turning those notifcations off? let us know, we want to help! mt | I know you're not making good use of that service. How do you feel about the inconvenience? | The delivery notifications are not working on my child's kindles. I will contact your sales rep to help. |
| 4 | dear . sick of ordering items to find, after paying, they are coming from china and take weeks!! | so sorry about this. we do aim to deliver by the delivery date given at the time the order was placed. km | I understand, but what I do want to do is to help you resolve your issue. | i'm sorry i'm not here to help. |
| 5 | can you explain why my delivery was thrown over my back gate open!?there was no note left either to say where it wasdelivered. | i'm so sorry it was delivered like this! was any portion of your order damaged or missing? jr | Thank you so much for the response. I hope you're all okay. | Thank you for your assistance. |
| 6 | please can you look into this order: 206-9303243-6219526 i paid for next day delivery and still not received!! i would like 12 | we do not have access to account information on this platform, claire. what date were you given for the delivery of this order? is this item fulfilled by amazon or a seller?sm | I'm sorry to hear that. | Thank you. |
| 7 | tracking id q54023807103 if it was out for delivery yesterday why do i have to wait until tuesday? | hi, sorry to hear that, what was the estimated delivery date stated on the order confirmation email? jj | Your order was shipped yesterday, did you receive the package? | It was out for delivery yesterday but you can still pick up the order one day later. |
| 8 | experience on-time correct order missing item delivered to door said he couldn't come in bc of other deliveries pleasant courier bonus | sorry to hear about your experience, please send us a note here: the url and we'll look into this further. | We were able to correct the issue in the delivery times. | i will contact you later to make sure your complaint is dealt with properly. |
| 9 | you missed delivering a package to my work address. i dont know how but can you hold my package at the facility so i can pick it up tonight? | please dm us details of your concern or compliment include your tracking and phone number. don't forget to provide your shipping address. lg the url | I can put in the information on your account. | I heard it was a problem for you as well. Is there anything else I can do for you? |
| 10 | hey my package is literally right next door. less than 5 miles away. why would it take two days to deliver that??? the url | our team would like to research more, please dm the tracking number, address, and telephone number. tv the url | That is a problem. What is the best way to address this issue? | Sorry it took so long! What was the delay? |
| 11 | so after a 3rd terrible experience with deliveries i will be taking my custom elsewhere from now on poorservice disappointed | we'd love to help out in any way we can! without sharing accountpersonal details, can you please tell us what happened? wj | I have tried this store twice now, both times I was not satisfied. | For your shipping, I would recommend you to take your order from the online delivery provider. |
| 12 | pathetic service by , order should have delivered by 28 oct but no body including dnt know whr my order is. superannoyed loss paying extra money to stay at delivery address. order no: 204-4348609-9182765 | i'm sorry for the frustrating experience! did our customer service set any expectations for the parceldelivery? mj | Sorry to hear that. I can help you on your request. | You have to try again. |
| 13 | this is the 2nd time my prime membership let me down. placed order on 09112017 and still not delivered! amazonprime the url | oh no! i'm sorry that your packages have been delayed. product availability, severe weather, and carrier capacity can all affect your delivery date. please reach out to us here for further assistance: the url sp | What's up with this? | What is the problem? |
| 14 | my package has been in china since the 29th and its suppose to be here in the usa by the 3rd? whats going on??? | please dm us details of your concern or compliment include your tracking and phone number. we're here to help. lg the url | I will try to pick up your package from china by the 3rd. | I'm sorry to hear that. |
| 15 | package said it will be delivered by end of day it's already 930pm pacific time still no package what's going on? | i understand how upsetting this must be. we are working to process and deliver packages as best we can. i do apologize for any inconvenience this delay has caused. ea the url | Wait for your package to arrive and you'll be able to get it. | I have to be clear about the package. Can you confirm that this was our fault? |
| 16 | has anyone noticed 'guaranteed delivery' dates have been slipping by a lot lately? ominous holiday troubles ahead? | i'm sorry to hear this! have you reported these late orders to our customer support team? if so, what information was provided? ez | Oh no, not a lot! I've asked for a new item and it says 'guaranteed delivery'. Could you please tell me what are the reasons? | I'm sorry to hear that. My phone is crashing, and I can't get it fixed. |
| 17 | dear , you should highly reconsider sending my prime orders through . nothing is ever delivered in two days... sincerely, where the fuck is my package | thanks for reaching out to us. what was the delivery date we gave you for this order?sm | That's okay, thank you for choosing Amazon. | You are correct. I realize that you have very limited shipping options, but I should be able to provide you with more options. |
| 18 | i've been waiting in all day for a delivery, i've just checked on it and it says an attempt was made..uhh no there wasn't | oh no! let's try reaching out to the courier to get some additional information: the url fd | Thank you. I wish I could help you. | Sure, there must be a problem with the delivery. |
| 19 | hi there... it's embarrassing that i filed a return 17th due to defective product.. however i am still waiting... | we're sorry for the delay, yogesh. was a pick up scheduled for your product? if yes, when was it due? hk | You're welcome. Have a nice day! | I can't help but feel bad for you that you received a defective product. |
| 20 | amazon emailed me that my laptop charger delivery date has been delayed. i ordered it sept 26th with two day shipping. | i'm sorry for the delay! which carrier is assigned to your order? does the tracking on their website give any clarity? mv | Thank you. I hope you are fine. | You can cancel your order by sending it back. |
| 21 | why pay for prime and not get your package in 2 days?? come on.. | two-day shipping refers to transit time, in business days, once shipped. have we missed your confirmed delivery date? am | Sorry for the inconvenience. | How about if you can get a refund from the seller? |
| 22 | hi there, can someone else collect my dpd parcel in your winterstoke road store if they take my id and reference number? | hi meg, that shouldn't be a problem, but to double check, i'd recommend giving the store a call directly on 0117 966 3064. danny | Thank you for your help. | We will be happy to help. |
| 23 | how do i get a refund on shipping? ordered overnight shipping and it took an extra few days! thanks for your help! | we're here to help, sorry for the wait! have we missed our expected delivery date provided here: the url ? please, let us know! je | Yes, I am trying to get refunds on your order. Can you email me your refund and receipt? | i'm sorry i'm not able to send the receipt. |
| 24 | had e-mail from amazonshipping attempted to deliver and failed. sitting in my office next to the front door! no delivery attempt! | i'm sorry to hear about the problem with this delivery! for further assistance, please use this: the url th | Thanks! | Thank you for your help. |
def evaluate_qa(predictions, references):
    rouge = evaluate.load('rouge')
    results = rouge.compute(
        predictions=predictions,
        references=references,
        use_aggregator=True,
        use_stemmer=True,
    )
    return results
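`rouge.compute` reports ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-Lsum F-measures. ROUGE-1, for instance, is the unigram-overlap F1 between a prediction and its reference. A hand-rolled sketch of that one metric (the `evaluate` library additionally applies stemming and aggregation, which this omits):

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    # unigram counts, with multiset intersection for the overlap
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# overlap = {for, your, help} -> P = 3/5, R = 3/4, F1 = 2/3
print(rouge1_f1("thank you for your help", "thanks for your help"))
```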
peft_model_dict = evaluate_qa(all_models_response['peft_model'], all_models_response['human'])
original_model_dict = evaluate_qa(all_models_response['original_model'], all_models_response['human'])
keys = list(peft_model_dict.keys())
values_peft_model = list(peft_model_dict.values())
values_original_model = list(original_model_dict.values())
fig = go.Figure()
fig.add_trace(go.Bar(x=keys, y=values_peft_model, name='PEFT Model'))
fig.add_trace(go.Bar(x=keys, y=values_original_model, name='Original Model'))
fig.update_layout(title='Comparison of PEFT Model and Original Model by ROUGE metric',
xaxis_title='ROUGE Metric',
yaxis_title='ROUGE Score',
barmode='group')
fig.show()
What does the UI look like?
Ways to improve the model through further iteration:
1) The most direct step would be to fine-tune on the full dataset rather than a subset; given more compute, this alone could yield a noticeable improvement.
2) The cleaning pipeline still needs refinement; better data quality should translate directly into better model performance.
3) An essential improvement would involve enabling the model to have contextual memory by incorporating the previous discussions between customers and the support team. This can be achieved by injecting the conversation history into the model through in-context learning, allowing it to provide more informed responses.
4) For in-context learning, the quality of prompt engineering plays a critical role. Enhancing the prompt by incorporating more relevant information and diverse examples will be beneficial, particularly for achieving more accurate few-shot inference.
5) The model selection deserves a more thorough investigation. It is unclear whether T5 is the best fit for this task and dataset, or whether alternative architectures such as BERT, RoBERTa, GPT, or even Llama 1 and 2, which are commonly used for question-answering tasks, would yield better results.
6) Both fine-tuning and in-context learning involve tunable parameters that can significantly impact the model's efficiency and effectiveness. Properly tuning these parameters is necessary to maximize the model's overall performance.
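Point 3 above can be sketched as a simple prompt builder that injects the most recent turns of a conversation ahead of the new customer comment. The format and the `max_turns` cutoff below are illustrative assumptions, not the notebook's actual prompt:

```python
def build_prompt_with_history(history, new_comment, max_turns=3):
    # history: list of (customer_message, agent_reply) pairs, oldest first
    lines = []
    for customer, agent in history[-max_turns:]:  # keep only the recent turns
        lines.append(f"Customer: {customer}")
        lines.append(f"Agent: {agent}")
    lines.append(f"Customer: {new_comment}")
    lines.append("Agent:")  # the model completes from here
    return "\n".join(lines)

history = [("where is my parcel?", "could you share your tracking number?")]
print(build_prompt_with_history(history, "it still has not arrived"))
```

Keeping only the last few turns is a crude but common way to stay within the model's context window; a production system would likely truncate by token count instead.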
Implementation and productionisation concerns